1. Introduction and the main result

Let $S$ be a random variable with mean $\mu$ and variance $\sigma^2$. The Berry-Esseen theorem gives a bound on the Kolmogorov distance between $\mathcal{L}(S)$ and $N(\mu,\sigma^2)$ when $S$ is a sum of independent random variables. Throughout this paper, let $c$ denote absolute constants, and let $|\cdot|$ denote the Euclidean norm or cardinality.

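Theorem 1.1. [Berry [5], Esseen [11]] Assume $S = \sum_{i=1}^n X_i$ where $\{X_1,\ldots,X_n\}$ are independent random variables with $EX_i = \mu_i$, $\mathrm{Var}X_i = \sigma_i^2$, $E|X_i - \mu_i|^3 = \gamma_i$. Let $\mu = \sum_{i=1}^n \mu_i$, $\sigma^2 = \sum_{i=1}^n \sigma_i^2$, $\gamma = \sum_{i=1}^n \gamma_i$. Then,

$$d_K(\mathcal{L}(S), N(\mu,\sigma^2)) \le c\gamma/\sigma^3 \tag{1.1}$$

where

$$d_K(\mathcal{L}(X), \mathcal{L}(Y)) = \sup_{z \in \mathbb{R}} |P(X \le z) - P(Y \le z)|.$$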
From (1.1), if $\sigma^{-2} = O(1/n)$ and $\gamma = o(n^{3/2})$, then

$$d_K(\mathcal{L}(S), N(\mu,\sigma^2)) \to 0 \quad \text{as } n \to \infty. \tag{1.2}$$

A stronger distance, total variation distance, between two distributions is defined as 

$$d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \sup_{A \subset \mathbb{R}} |P(X \in A) - P(Y \in A)|. \tag{1.3}$$

If S is integer valued, the convergence in (1.2) is no longer valid under total variation distance 
because 

$$d_{TV}(\mathcal{L}(S), N(\mu,\sigma^2)) = 1 \quad \forall\, n \ge 1. \tag{1.4}$$

Equation (1.4) follows by taking $A$ to be the set of integers in the definition of total variation distance. Therefore, we need to find alternatives to $N(\mu,\sigma^2)$ if small total variation distances are desired. Several alternatives have been studied, e.g., the translated Poisson distribution ([18], [19]), the shifted binomial distribution ([20]) and a new family of discrete distributions ([14]). Inspired by the idea of continuity correction, Chen and Leong [7] studied a more natural limiting distribution, the discretized normal distribution $N^d(\mu,\sigma^2)$, which is defined to be supported on the integer set $\mathbb{Z}$ and to have probability mass function at any integer $z \in \mathbb{Z}$ given by

$$P(z - \tfrac{1}{2} \le Z_{\mu,\sigma^2} < z + \tfrac{1}{2}) \tag{1.5}$$

where $Z_{\mu,\sigma^2}$ is a Gaussian variable with distribution $N(\mu,\sigma^2)$. Using the zero-bias coupling approach in Stein's method, Chen and Leong proved a bound on the total variation distance between the distribution of a sum of independent integer valued random variables and a discretized normal distribution. Their result is also presented in Theorem 7.4 of [6].
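The definition (1.5) is easy to evaluate numerically. The following is a minimal sketch, not taken from the paper: it computes the probability mass function of $N^d(\mu,\sigma^2)$ from the Gaussian distribution function and evaluates the total variation distance to a binomial distribution with matching mean and variance. The binomial example, the function names and the truncation parameter `pad` are illustrative assumptions.

```python
import math

def normal_cdf(x, mu, sigma):
    """Distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def discretized_normal_pmf(z, mu, sigma):
    """P(z - 1/2 <= Z < z + 1/2) for Z ~ N(mu, sigma^2), as in (1.5)."""
    return normal_cdf(z + 0.5, mu, sigma) - normal_cdf(z - 0.5, mu, sigma)

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def tv_binomial_vs_discretized_normal(n, p, pad=60):
    """Total variation distance between Binomial(n, p) and N^d(np, np(1-p)).

    The sum runs over the integers in [-pad, n + pad]; the Gaussian mass
    outside this window is added separately (the binomial has none there).
    """
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    lo, hi = -pad, n + pad
    total = 0.0
    for z in range(lo, hi + 1):
        b = binomial_pmf(z, n, p) if 0 <= z <= n else 0.0
        total += abs(b - discretized_normal_pmf(z, mu, sigma))
    tail = normal_cdf(lo - 0.5, mu, sigma) + (1.0 - normal_cdf(hi + 0.5, mu, sigma))
    return 0.5 * (total + tail)

tv_small = tv_binomial_vs_discretized_normal(50, 0.3)
tv_large = tv_binomial_vs_discretized_normal(400, 0.3)
```

In contrast with (1.4), the distance computed here is small and decreases as $n$ grows, which is the behaviour that the bounds in this paper quantify.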

In this paper, we adopt a different approach to deriving bounds on the total variation distance to the discretized normal distribution for general integer valued random variables by Stein's method. Stein's method was introduced by Stein [22], and has become an important approach in proving distributional approximations because of its power in handling dependence among random variables. We refer to [1] for an introduction to Stein's method. Recently, Chen and Röllin [8] introduced a general framework, Stein coupling, under which normal approximation results can be proved.

Definition 1.2. Let $S$ be a random variable with mean $\mu$. We say a triple of square-integrable random variables $(S, S', G)$ is a Stein coupling if

$$E\{Gf(S') - Gf(S)\} = E(S-\mu)f(S) \tag{1.6}$$

for all $f$ such that the above expectations exist.
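Identity (1.6) can be verified exactly on a small example. The sketch below is an illustration under stated assumptions rather than part of the paper: it takes $S$ to be a sum of independent Bernoulli variables and uses the triple $S' = S - (X_I - \mu_I)$, $G = -n(X_I - \mu_I)$ with $I$ uniform, which is the special case $A_i = \{i\}$ of the local dependence coupling of Section 2.1. The function name and the test functions are hypothetical.

```python
import itertools
import math

def stein_coupling_gap(probs, f):
    """Return E{G f(S') - G f(S)} - E(S - mu) f(S) by exact enumeration.

    Assumptions (illustrative): X_i ~ Bernoulli(probs[i]) independent,
    I uniform on the indices, S' = S - (X_I - mu_I), G = -n (X_I - mu_I).
    """
    n = len(probs)
    mu = sum(probs)
    lhs = 0.0
    rhs = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for x, p in zip(xs, probs):
            weight *= p if x == 1 else 1.0 - p
        s = sum(xs)
        rhs += weight * (s - mu) * f(s)
        for i in range(n):  # the event I = i has probability 1/n
            g = -n * (xs[i] - probs[i])
            s_prime = s - (xs[i] - probs[i])
            lhs += weight / n * (g * f(s_prime) - g * f(s))
    return lhs - rhs

gap_sin = stein_coupling_gap([0.2, 0.5, 0.7, 0.9], math.sin)
gap_cube = stein_coupling_gap([0.3, 0.3, 0.6], lambda s: s**3)
```

Both gaps vanish up to floating point error, because $S'$ is independent of $X_I$ given $I$, so the term $EGf(S')$ is zero and $-EGf(S)$ equals $E(S-\mu)f(S)$.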
The above definition is adapted from [8] and includes many of the coupling structures
employed in Stein's method such as local dependence, exchangeable pairs, and size biasing. 
The following theorem is our main result, the proof of which is presented in Section 3. 

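Theorem 1.3. Let $S$ be an integer valued random variable with mean $\mu$ and finite variance $\sigma^2$. Suppose we can construct a Stein coupling $(S, S', G)$ so that (1.6) is satisfied. Then, with $D = S' - S$,

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le \frac{2}{\sigma^2}\sqrt{\mathrm{Var}(E(GD|S))} + \sqrt{\frac{\pi}{8}}\,\frac{E|GD^2|}{\sigma^3} + \frac{\sqrt{EG^2D^4}}{\sigma^3} + \frac{1}{2\sigma^2}E\Big[(|GD^2| + |GD|)\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\Big] \tag{1.7}$$

where $\mathcal{F}$ is a $\sigma$-field such that $\mathcal{B}(G,D) \subset \mathcal{F}$, and $\mathcal{B}(\cdot)$ denotes the $\sigma$-field generated by a random variable.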

Remark 1.4. The discretization defined in (1.5) has no loss of generality. For example, one may define another discretized normal distribution $N^{d'}(\mu,\sigma^2)$ with probability mass function at $z$ as

$$P(z \le Z_{\mu,\sigma^2} < z + 1).$$

Then,

$$d_{TV}(N^d(\mu,\sigma^2), N^{d'}(\mu,\sigma^2)) = d_{TV}(N^d(\mu,\sigma^2), N^d(\mu - \tfrac{1}{2},\sigma^2)) \le d_{TV}(N(\mu,\sigma^2), N(\mu - \tfrac{1}{2},\sigma^2)) \le c/\sigma.$$

It can be seen from (3.8) in the proof of Theorem 1.3 that the bound (1.7) will only differ by a constant if one changes the limiting distribution from $N^d(\mu,\sigma^2)$ to $N^{d'}(\mu,\sigma^2)$.
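The chain of inequalities in Remark 1.4 can be checked numerically. The sketch below is an illustration only; the function names and the truncation width are assumptions. It uses the standard closed form $d_{TV}(N(\mu,\sigma^2), N(\mu-\delta,\sigma^2)) = 2\Phi(\delta/(2\sigma)) - 1$ with $\delta = 1/2$.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pmf_d(z, mu, sigma):
    """Mass of N^d(mu, sigma^2) at z: P(z - 1/2 <= Z < z + 1/2)."""
    return normal_cdf(z + 0.5, mu, sigma) - normal_cdf(z - 0.5, mu, sigma)

def pmf_d_prime(z, mu, sigma):
    """Mass of the alternative discretization at z: P(z <= Z < z + 1)."""
    return normal_cdf(z + 1.0, mu, sigma) - normal_cdf(z, mu, sigma)

def tv_between_discretizations(mu, sigma, half_width=200):
    """Half the l1 distance between the two mass functions (truncated sum)."""
    center = round(mu)
    return 0.5 * sum(
        abs(pmf_d(z, mu, sigma) - pmf_d_prime(z, mu, sigma))
        for z in range(center - half_width, center + half_width + 1)
    )

def tv_normal_half_shift(sigma):
    """d_TV(N(mu, sigma^2), N(mu - 1/2, sigma^2)) = 2 Phi(1 / (4 sigma)) - 1."""
    return 2.0 * normal_cdf(1.0 / (4.0 * sigma), 0.0, 1.0) - 1.0

results = [(s, tv_between_discretizations(3.7, s), tv_normal_half_shift(s))
           for s in (1.0, 3.0, 10.0)]
```

For each value of $\sigma$ the discrete distance is at most the continuous one, and the continuous one is at most $0.2/\sigma$, in line with the final bound $c/\sigma$.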

Remark 1.5. The first three terms in the bound (1.7) are comparable to those appearing in the upper bounds of the Kolmogorov or Wasserstein distance for normal approximations (see, e.g., Corollary 2.2 of [8]). The last term in the bound (1.7) arises because we are working in the total variation distance. It is easy to see that such a term must appear by considering the case when $S$ has support restricted to the even integers. Intuitively, the bigger $\mathcal{F}$ is, the larger $d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))$ becomes. On the other hand, it is easier to bound $d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))$ given more information.

Röllin and Ross [21] provided a general method to bound $d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1))$ for a given integer valued random variable $V$.

Lemma 1.6. [Röllin and Ross [21]] For a given integer valued random variable $V$, if we can construct an exchangeable pair $(V, V')$ (i.e., $\mathcal{L}(V,V') = \mathcal{L}(V',V)$) so that $P(V - V' = 1) > 0$, then

$$d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1)) \le \frac{\sqrt{\mathrm{Var}(E(I(V - V' = 1)|V))} + \sqrt{\mathrm{Var}(E(I(V - V' = -1)|V))}}{P(V - V' = 1)}.$$
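To see the size of the bound in Lemma 1.6 in the simplest case, the following sketch (an illustration under assumptions, not the paper's computation) takes $V \sim$ Binomial$(n,p)$ written as a sum of i.i.d. Bernoulli variables, with $V'$ obtained by resampling one uniformly chosen summand. Then $E(I(V-V'=1)|V) = V(1-p)/n$ and $E(I(V-V'=-1)|V) = (n-V)p/n$, and the right-hand side of the lemma can be compared with the exact value of $d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1))$. All function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def tv_shift_exact(n, p):
    """Exact d_TV(L(V), L(V + 1)) for V ~ Binomial(n, p)."""
    pmf = [0.0] + [binomial_pmf(k, n, p) for k in range(n + 1)] + [0.0]
    # pmf[k + 1] is P(V = k); consecutive differences compare V with V + 1.
    return 0.5 * sum(abs(pmf[k + 1] - pmf[k]) for k in range(n + 2))

def variance(values, pmf):
    mean = sum(v * w for v, w in zip(values, pmf))
    return sum((v - mean) ** 2 * w for v, w in zip(values, pmf))

def lemma_bound_binomial(n, p):
    """Right-hand side of Lemma 1.6 for the resample-one-summand pair."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    down = [k * (1.0 - p) / n for k in range(n + 1)]  # E(I(V - V' = 1) | V = k)
    up = [(n - k) * p / n for k in range(n + 1)]      # E(I(V - V' = -1) | V = k)
    prob_down = sum(d * w for d, w in zip(down, pmf))  # P(V - V' = 1)
    return (math.sqrt(variance(down, pmf)) + math.sqrt(variance(up, pmf))) / prob_down

exact = tv_shift_exact(40, 0.3)
bound = lemma_bound_binomial(40, 0.3)
```

In this example the bound simplifies to $1/\sqrt{np(1-p)}$, which has the same order in $n$ as the exact distance.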

We first apply Lemma 1.6 to sums of independent integer valued random variables to recover Proposition 4.6 of [3].

Proposition 1.7. Let $S = \sum_{i=1}^n X_i$ where $\{X_1, X_2, \ldots, X_n\}$ are independent integer valued random variables. Then

$$d_{TV}(\mathcal{L}(S), \mathcal{L}(S+1)) \le \sqrt{8}\,\Big(\sum_{i=1}^n \big(1 - d_{TV}(\mathcal{L}(X_i), \mathcal{L}(X_i+1))\big)\Big)^{-1/2}.$$

We defer the proof of Proposition 1.7 to Section 3. Lemma 1.6 can also be applied when V 
is not a sum of independent random variables. There are several general methods to construct 
exchangeable pairs in the literature of Stein's method. 

Functions of independent random variables. Let $S = f(X_1,\ldots,X_n)$ be a random variable where $\{X_1,\ldots,X_n\}$ are independent. Let $I$ be a uniform random index from $\{1,\ldots,n\}$, independent of $\{X_1,\ldots,X_n\}$. Given $I$, let $X_I'$ be an independent copy of $X_I$. Then $(S, S')$ is an exchangeable pair where

$$S' = f(X_1,\ldots,X_I',\ldots,X_n).$$
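The exchangeability of this resampling construction can be confirmed by exact enumeration on a small example. The sketch below is illustrative: the Bernoulli inputs, the choice of $f$ (the number of adjacent pairs of ones) and the function names are assumptions made for the demonstration.

```python
import itertools
from collections import defaultdict

def resampling_pair_law(probs, f):
    """Joint law of (S, S') with S = f(X) and S' = f(X with X_I resampled).

    X_i ~ Bernoulli(probs[i]) independent, I uniform on the indices.
    Returns a dict mapping (s, s') to its probability.
    """
    n = len(probs)
    law = defaultdict(float)
    for xs in itertools.product((0, 1), repeat=n):
        w = 1.0
        for x, p in zip(xs, probs):
            w *= p if x else 1.0 - p
        for i in range(n):
            for new in (0, 1):
                w_new = probs[i] if new else 1.0 - probs[i]
                ys = xs[:i] + (new,) + xs[i + 1:]
                law[(f(xs), f(ys))] += w * w_new / n
    return dict(law)

def two_runs(xs):
    """Number of adjacent pairs of ones (no wrap-around)."""
    return sum(a * b for a, b in zip(xs, xs[1:]))

law = resampling_pair_law([0.2, 0.5, 0.7, 0.4], two_runs)
```

The returned law is symmetric in its two arguments, which is exactly the statement $\mathcal{L}(S,S') = \mathcal{L}(S',S)$.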

Reversible Markov chains. Let $\{M_t : t = 1, 2, \ldots\}$ be a reversible Markov chain starting from its stationary distribution. Then $(M_t, M_{t+1})$ is an exchangeable pair.

Local dependence. [Reinert [16]] Let $S = \sum_{i=1}^n X_i$ be a sum of locally dependent random variables, i.e., for each $i \in \{1,2,\ldots,n\}$, there exists $A_i \subset \{1,2,\ldots,n\}$ such that $X_i$ is independent of $\{X_j : j \notin A_i\}$. Let $I$ be uniformly chosen from $\{1,2,\ldots,n\}$ and independent of $\{X_1,\ldots,X_n\}$. Given $I$, let $X_I'$ be an independent copy of $X_I$, and let $\{X_j' : j \in A_I, j \ne I\}$ be independently generated from $\mathcal{L}(\{X_j : j \in A_I, j \ne I\} \,|\, X_I', X_k : k \notin A_I)$. Let

$$S' = S - \sum_{j \in A_I}(X_j - X_j').$$

Then $(S, S')$ is an exchangeable pair.

In the next section, we show the utility of Theorem 1.3 by adapting it to local dependence, exchangeable pairs, and size biasing, and bounding the total variation distance for discretized normal approximations for 2-runs in a sequence of i.i.d. Bernoulli random variables, the number of vertices with a given degree in the Erdős-Rényi random graph, and the uniform multinomial occupancy model. In Section 3, we give the proofs for Theorem 1.3 and Proposition 1.7.



2. Applications 

In this section, we apply Theorem 1.3 to prove discretized normal approximation results for 
integer valued random variables with different dependence structures. 



2.1. Local dependence 



Let $S = \sum_{i=1}^n X_i$ be a sum of integer valued random variables with $EX_i = \mu_i$, $\mu = \sum_{i=1}^n \mu_i$ and $\mathrm{Var}(S) = \sigma^2$. Suppose for each $i \in \{1,2,\ldots,n\}$, there exist neighborhoods $A_i, B_i \subset \{1,2,\ldots,n\}$ such that $X_i$ is independent of $\{X_j : j \notin A_i\}$, and $\{X_j : j \in A_i\}$ is independent of $\{X_j : j \notin B_i\}$. It can be verified as in Section 3.2 of [8] that

$$(S, S', G) = \Big(S,\ S - \sum_{j \in A_I}(X_j - \mu_j),\ -n(X_I - \mu_I)\Big)$$

is a Stein coupling where $I$ is a uniform random index from $\{1,2,\ldots,n\}$ and independent of $\{X_1, X_2, \ldots, X_n\}$. Applying Theorem 1.3, we have the following corollary.

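Corollary 2.1. Under the above setting, assume that for every $i \in \{1,2,\ldots,n\}$, $|N(B_i)| \le \theta$ where $N(B_i) = \{j \in \{1,2,\ldots,n\} : A_j \cap B_i \ne \emptyset\}$. Let

$$\xi_i = \frac{X_i - \mu_i}{\sigma}, \qquad \eta_i = \frac{\sum_{j \in A_i}(X_j - \mu_j)}{\sigma}.$$

Then,

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le 2\Big(\theta\sum_{i=1}^n E\xi_i^2\eta_i^2\Big)^{1/2} + \sqrt{\frac{\pi}{8}}\sum_{i=1}^n E|\xi_i\eta_i^2| + \Big(n\sum_{i=1}^n E\xi_i^2\eta_i^4\Big)^{1/2} + \frac{1}{2}\sum_{i=1}^n E\Big[(|\xi_i\eta_i| + \sigma|\xi_i\eta_i^2|)\,d_{TV}(\mathcal{L}(S|\mathcal{F}_i), \mathcal{L}(S+1|\mathcal{F}_i))\Big] \tag{2.1}$$

where $\mathcal{F}_i$ is a $\sigma$-field such that $\mathcal{B}(X_j : j \in A_i) \subset \mathcal{F}_i$.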

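Proof. Let $I$ be a uniform random index from $\{1,\ldots,n\}$ and independent of $\{X_1,\ldots,X_n\}$. Let $G = -n(X_I - \mu_I)$, $D = -\sum_{j \in A_I}(X_j - \mu_j)$, and let $X_{A_i} = \{X_j : j \in A_i\}$. We calculate the terms in the bound (1.7) as follows. From the definitions of neighborhoods $A_i, B_i$ and the inequality $\mathrm{Cov}(X,Y) \le (EX^2 + EY^2)/2$, we have

$$\begin{aligned}
\mathrm{Var}(E(GD|S)) &\le \mathrm{Var}\Big(\sum_{i=1}^n (X_i - \mu_i)\sum_{j \in A_i}(X_j - \mu_j)\Big) \\
&\le \sum_{i,i' :\, X_{A_i}, X_{A_{i'}} \text{ not independent}} \mathrm{Cov}\Big((X_i - \mu_i)\sum_{j \in A_i}(X_j - \mu_j),\ (X_{i'} - \mu_{i'})\sum_{j' \in A_{i'}}(X_{j'} - \mu_{j'})\Big) \\
&\le \sum_{i,i' :\, X_{A_i}, X_{A_{i'}} \text{ not independent}} E\Big\{\frac{[(X_i - \mu_i)\sum_{j \in A_i}(X_j - \mu_j)]^2}{2} + \frac{[(X_{i'} - \mu_{i'})\sum_{j' \in A_{i'}}(X_{j'} - \mu_{j'})]^2}{2}\Big\} \\
&\le \theta\sum_{i=1}^n E\Big[(X_i - \mu_i)\sum_{j \in A_i}(X_j - \mu_j)\Big]^2 \\
&= \sigma^4\theta\sum_{i=1}^n E\xi_i^2\eta_i^2.
\end{aligned}$$

Moreover,

$$E|GD| = \sigma^2\sum_{i=1}^n E|\xi_i\eta_i|, \qquad E|GD^2| = \sigma^3\sum_{i=1}^n E|\xi_i\eta_i^2|, \qquad EG^2D^4 = n\sigma^6\sum_{i=1}^n E\xi_i^2\eta_i^4.$$

The corollary is proved by applying the above bounds in (1.7) with $\mathcal{F} = \mathcal{B}(I, \mathcal{F}_I)$. □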

We remark that in the case that $S$ is a sum of independent integer valued random variables, a modification of the arguments from intermediate terms in the proof of Theorem 1.3 yields a result similar to Theorem 7.4 of [6].

2.1.1. 2-runs 

We provide a concrete example here. Let $\zeta_1, \zeta_2, \ldots, \zeta_n$ be independent and identically distributed Bernoulli variables with $P(\zeta_i = 1) = 1 - P(\zeta_i = 0) = p$. Suppose $n > 7$. Let $S = \sum_{i=1}^n X_i$ where $X_i = \zeta_i\zeta_{i+1}$. Here and in the rest of this example, indices outside $\{1,2,\ldots,n\}$ are understood as one plus their residues mod $n$. We can apply Corollary 2.1 with $A_i = \{i-1, i, i+1\}$, $B_i = \{i-2, \ldots, i+2\}$, so that $\theta = 7$. The mean and variance of $S$ can be calculated as

$$\mu = ES = np^2, \qquad \sigma^2 = \mathrm{Var}(S) = n(p^2 + 2p^3 - 3p^4).$$
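The mean and variance formulas for circular 2-runs can be confirmed by brute-force enumeration for small $n$. The following sketch is illustrative (the function name and the test values $n = 8$, $p = 0.3$ are assumptions); it uses 0-based indices with wrap-around, which is equivalent to the mod-$n$ convention above.

```python
import itertools

def circular_two_runs_moments(n, p):
    """Exact mean and variance of S = sum_i zeta_i * zeta_{i+1} (indices mod n)."""
    mean = 0.0
    second = 0.0
    for zeta in itertools.product((0, 1), repeat=n):
        ones = sum(zeta)
        weight = p**ones * (1.0 - p) ** (n - ones)
        s = sum(zeta[i] * zeta[(i + 1) % n] for i in range(n))
        mean += weight * s
        second += weight * s * s
    return mean, second - mean**2

mean, var = circular_two_runs_moments(8, 0.3)
```

The variance formula reflects $\mathrm{Var}(X_i) = p^2 - p^4$ together with $\mathrm{Cov}(X_i, X_{i\pm 1}) = p^3 - p^4$ for the two neighbours of each index.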



ims art -generic ver. 2011/11/15 file: Disc.tex date: April 3, 2012 



X. Fang/Discretized normal approximation 7 

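Applying (2.1) with $\mathcal{F}_i = \mathcal{B}(\zeta_{i-1}, \zeta_i, \zeta_{i+1}, \zeta_{i+2})$, along with the upper bound $|\xi_i| \le 1/\sigma$ and the corresponding bound on $|\eta_i|$, we obtain

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le \frac{c_p}{\sqrt{n}} + c_p'\,d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1))$$

where $c_p, c_p'$ are constants depending on $p$ and, with $m = n - 4$ and $a, b \in \{0,1\}$ given,

$$V = a\zeta_1 + \sum_{i=1}^{m-1}\zeta_i\zeta_{i+1} + \zeta_m b.$$

Regarding $V = f(\zeta_1,\ldots,\zeta_m)$, we define $V' = f(\zeta_1,\ldots,\zeta_I',\ldots,\zeta_m)$ where $I$ is uniformly chosen from $\{1,2,\ldots,m\}$, independent of $\{\zeta_1,\ldots,\zeta_m\}$, and given $I$, $\zeta_I'$ is an independent copy of $\zeta_I$. Then $(V, V')$ is an exchangeable pair and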

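$$E(I(V - V' = 1)|\{\zeta_1,\ldots,\zeta_m\}) = \frac{1-p}{m}\Big[I(a + \zeta_2 = 1, \zeta_1 = 1) + I(b + \zeta_{m-1} = 1, \zeta_m = 1) + \sum_{i=2}^{m-1} I(\zeta_{i-1} + \zeta_{i+1} = 1, \zeta_i = 1)\Big].$$

It is easy to verify that

$$P(V - V' = 1) \ge \frac{2(n-6)}{n-4}\,p^2(1-p)^2$$

and

$$\sqrt{\mathrm{Var}(E(I(V - V' = 1)|V))} \le \sqrt{\mathrm{Var}(E(I(V - V' = 1)|\{\zeta_1,\ldots,\zeta_m\}))} \le \frac{1-p}{n-4}\sqrt{3(n-4)}.$$

Similarly,

$$\sqrt{\mathrm{Var}(E(I(V - V' = -1)|V))} \le \frac{p}{n-4}\sqrt{3(n-4)}.$$

From Lemma 1.6,

$$d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1)) \le \frac{\sqrt{3(n-4)}}{2(n-6)p^2(1-p)^2}.$$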
Therefore, we have proved the following proposition. 

Proposition 2.2. For the above defined S , we have 

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le C_p/\sqrt{n}$$

where Cp is a constant depending on p. 

We remark that the above argument also applies to $k$-runs with $k > 2$. Total variation approximation for 2-runs was studied by Barbour and Xia [3] and Röllin [18] using the translated Poisson approximation. Barbour and Xia [3] assumed some extra conditions on $p$ to obtain a bound on the total variation distance between $\mathcal{L}(S)$ and a translated Poisson distribution. Although the result in [18] applies for all $p$, the approach used was different from ours.



2.2. Exchangeable pairs 

Stein [23] introduced the exchangeable pair approach in Stein's method. Let $(S, S')$ be an exchangeable pair of integer valued random variables with $ES = \mu$, $\mathrm{Var}(S) = \sigma^2$. Suppose we have the following approximate linearity condition

$$E(S - S'|S) = \lambda(S - \mu) + \sigma E(R|S). \tag{2.2}$$

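A simple modification of Theorem 1.3 yields the following corollary.

Corollary 2.3. We have

$$\begin{aligned}
d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le{}& \frac{\sqrt{\pi/2}\,E|R| + 2\sqrt{ER^2}}{\lambda} + \frac{\sqrt{\mathrm{Var}(E((S'-S)^2|S))}}{\lambda\sigma^2} + \sqrt{\frac{\pi}{8}}\,\frac{E|S'-S|^3}{2\lambda\sigma^3} + \frac{\sqrt{E|S'-S|^6}}{2\lambda\sigma^3} \\
&+ \frac{1}{4\lambda\sigma^2}E\Big[(|S'-S|^3 + (S'-S)^2)\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\Big]
\end{aligned} \tag{2.3}$$

where $\mathcal{F}$ is a $\sigma$-field such that $\mathcal{B}(S' - S) \subset \mathcal{F}$.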

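Proof. We follow the proof of Theorem 1.3 with minor modification. Let $G = \frac{1}{2\lambda}(S' - S)$ and $D = S' - S$. From (2.2) and the exchangeability of $(S, S')$,

$$E(S - \mu)f(S) = E\{Gf(S') - Gf(S)\} - \frac{\sigma}{\lambda}Ef(S)R.$$

Therefore, (3.5) has an extra term $\sigma Ef_h(S)R/\lambda$, which is bounded by $\sqrt{\pi/2}\,E|R|/\lambda$ from (3.3). Moreover, from (2.2), $EGD = \sigma^2 + \sigma E((S-\mu)R)/\lambda$. Hence instead of (3.6),

$$|R_1| \le \frac{2}{\sigma^2}\Big(\sqrt{\mathrm{Var}(E(GD|S))} + \frac{\sigma}{\lambda}E|(S-\mu)R|\Big) \le \frac{\sqrt{\mathrm{Var}(E((S'-S)^2|S))}}{\lambda\sigma^2} + \frac{2\sqrt{ER^2}}{\lambda}.$$

Corollary 2.3 follows from Theorem 1.3 and the above arguments. □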



If the exchangeable pair $(S, S')$ satisfies $|S' - S| \le 1$, we have the following corollary.

Corollary 2.4. Let $(S, S')$ be an exchangeable pair of integer valued random variables satisfying the approximate linearity condition (2.2). In addition, suppose $|S - S'| \le 1$. Then we have



dTv{C{S),N''{fi,a^)) 



Proof. Let $G = \frac{1}{2\lambda}(S' - S)$, $D = S' - S$. Then for $h \in \mathcal{H}$ as defined in (3.1),

$$\begin{aligned}
EG\int_0^D (h(S+t) - h(S))\,dt &= \frac{1}{2\lambda}E(S' - S)\int_0^{S'-S}(h(S+t) - h(S))\,dt \\
&= \frac{1}{2\lambda}E\Big[\int_0^1 (h(S+t) - h(S))\,dt\,I(S' - S = 1) + \int_{-1}^0 (h(S+t) - h(S))\,dt\,I(S' - S = -1)\Big] \\
&= \frac{1}{4\lambda}E\big[(h(S+1) - h(S))I(S' - S = 1) + (h(S-1) - h(S))I(S' - S = -1)\big] \\
&= \frac{1}{4\lambda}E\big[(h(S') - h(S))I(S' - S = 1) - (h(S) - h(S'))I(S - S' = 1)\big] \\
&= 0.
\end{aligned} \tag{2.5}$$

We used the exchangeability of $(S, S')$ in the last equality. From (2.5), the upper bound in (3.8) can be replaced by 0. Therefore, the bound on $d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2))$ can be deduced similarly as in the proof of Corollary 2.3 except that we do not have the last term of (2.3). □



Remark 2.5. Under the condition of Corollary 2.4, Röllin [19] obtained a bound on the total variation distance between $\mathcal{L}(S)$ and a translated Poisson distribution. His result, together with the triangle inequality and easy bounds on the total variation distance between the translated Poisson distribution and the discretized normal distribution, yields a bound similar to (2.4).

Remark 2.6. Exchangeable pairs of integer valued random variables $(S, S')$ such that $|S' - S| \le 1$ are commonly seen in the literature, e.g., the binary expansion of a random integer [Diaconis [10]] and the anti-voter model [Rinott and Rotar [17]]. Corollary 2.4 shows that under this special assumption, bounding the total variation distance requires no more effort than bounding the Kolmogorov distance.
2.3. Size biasing

Let $S$ be a non-negative integer valued random variable with mean $\mu$ and let $S^s$ have the $S$-size biased distribution, i.e.,

$$ESf(S) = \mu Ef(S^s)$$

for all $f$ such that the above expectations exist. If $S^s$ is defined on the same probability space as $S$, then

$$(S, S', G) = (S, S^s, \mu) \tag{2.6}$$

is a Stein coupling. Size biasing was first introduced in the context of Stein's method by Goldstein and Rinott [13]. Theorem 1.3 has the following corollary for size biasing which easily follows from (2.6).

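Corollary 2.7. Let $S$ be a non-negative integer valued random variable with mean $\mu$ and finite variance $\sigma^2$. Let $S^s$ be defined on the same probability space and have the $S$-size biased distribution. Then

$$\begin{aligned}
d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le{}& \frac{2\mu}{\sigma^2}\sqrt{\mathrm{Var}(E(S^s - S|S))} + \sqrt{\frac{\pi}{8}}\,\frac{\mu}{\sigma^3}E|S^s - S|^2 + \frac{\mu}{\sigma^3}\sqrt{E|S^s - S|^4} \\
&+ \frac{\mu}{2\sigma^2}E\Big[(|S^s - S|^2 + |S^s - S|)\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\Big]
\end{aligned} \tag{2.7}$$

where $\mathcal{F}$ is a $\sigma$-field such that $\mathcal{B}(S^s - S) \subset \mathcal{F}$.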
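The size-bias identity $ESf(S) = \mu Ef(S^s)$ behind the Stein coupling (2.6) can be illustrated with a standard example that is not taken from the paper: if $S \sim$ Binomial$(n,p)$, then $1 + $ Binomial$(n-1,p)$ has the $S$-size biased distribution. The sketch below, with hypothetical function names, computes the size-biased mass function $P(S^s = k) = kP(S = k)/\mu$ directly and compares it with the shifted binomial.

```python
import math

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def size_biased_pmf(pmf):
    """Mass function k * P(S = k) / E S of the size biased distribution."""
    mu = sum(k * w for k, w in enumerate(pmf))
    return [k * w / mu for k, w in enumerate(pmf)]

def expectation(g, pmf):
    return sum(g(k) * w for k, w in enumerate(pmf))

n, p = 12, 0.35
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
biased = size_biased_pmf(pmf)
shifted = [0.0] + [binomial_pmf(k, n - 1, p) for k in range(n)]  # law of 1 + Bin(n-1, p)

f = lambda k: math.cos(k) + k**2                   # an arbitrary test function
lhs = expectation(lambda k: k * f(k), pmf)         # E S f(S)
rhs = n * p * expectation(f, shifted)              # mu E f(S^s)
```

In this example $S^s - S$ can be realised with $|S^s - S| \le 1$ by setting one uniformly chosen Bernoulli summand equal to one, so the moments appearing in (2.7) are bounded.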



2.3.1. Number of vertices with a given degree in the Erdős-Rényi random graph

Let $G(n,p_n)$ be an Erdős-Rényi random graph with vertex set $\{1,2,\ldots,n\}$ and edge probability $p_n$. Let $S_n$ be the number of vertices with a given degree $d \ge 0$ in $G(n,p_n)$. The asymptotic normality of $S_n$ was proved in [2] when $np_n \to \theta > 0$. Under the conditions

$$\exists\, 0 < \theta' < \theta'' < \infty \text{ s.t. } \theta' \le \theta_n \le \theta'' \text{ and } p_n = \theta_n/(n-1) \in (0,1) \text{ for all } n \ge 2, \tag{2.8}$$

Goldstein [12] proved a bound on the Kolmogorov distance between the distribution of $S_n$ and $N(\mu_n,\sigma_n^2)$,

$$d_K(\mathcal{L}(S_n), N(\mu_n,\sigma_n^2)) \le c_d/\sqrt{n}$$

where $\mu_n$ and $\sigma_n^2$ are the mean and variance of $S_n$ respectively. Here and in the rest of this example, let $c_d = c(d,\theta',\theta'')$ denote positive constants which may depend on $d, \theta', \theta''$. In the following proposition, we prove a bound on the total variation distance between the distribution of $S_n$ and $N^d(\mu_n,\sigma_n^2)$.

Proposition 2.8. With $S_n$ defined above and assuming (2.8), we have

$$d_{TV}(\mathcal{L}(S_n), N^d(\mu_n,\sigma_n^2)) \le c_d/\sqrt{n}.$$



Proof. In [12], it was proved that under condition (2.8),

$$\frac{n}{c_d} \le \mu_n, \sigma_n^2 \le c_d n.$$

Let $deg(i)$ denote the degree of vertex $i$. Then $S_n$ can be expressed as

$$S_n = \sum_{i=1}^n I(deg(i) = d).$$

Following Goldstein and Rinott [13], let $I$ be uniformly chosen from $\{1,2,\ldots,n\}$ and independent of $G(n,p_n)$. If $deg(I) = d$, then we define $G^s(n,p_n)$, the size biased graph, to be the same as $G(n,p_n)$. If $deg(I) > d$, then we obtain $G^s(n,p_n)$ from $G(n,p_n)$ by removing $deg(I) - d$ edges chosen uniformly at random from the edges that connect to $I$ in $G(n,p_n)$. If $deg(I) < d$, then we obtain $G^s(n,p_n)$ from $G(n,p_n)$ by connecting $I$ to $d - deg(I)$ vertices chosen uniformly at random from those not connected to $I$ in $G(n,p_n)$. Let $S_n^s$ be the number of vertices with degree $d$ in the graph $G^s(n,p_n)$. It was proved in [13] that $S_n^s$ has the $S_n$-size biased distribution and

$$\mathrm{Var}(E(S_n^s - S_n|S_n)) \le c_d/n.$$

From the construction of $G^s(n,p_n)$,

$$|S_n^s - S_n| \le |deg(I) - d| + 1. \tag{2.9}$$

From (2.8), for any positive integer $k$ which is bounded by an absolute constant,

$$E\,deg(I)^k \le c_d. \tag{2.10}$$

Therefore, 

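By Corollary 2.7, the proof will be complete after we show that

$$E\big[(|S_n^s - S_n|^2 + |S_n^s - S_n|)\,d_{TV}(\mathcal{L}(S_n|\mathcal{F}), \mathcal{L}(S_n+1|\mathcal{F}))\big] \le c_d/\sqrt{n} \tag{2.11}$$

for a $\sigma$-field $\mathcal{F}$ such that $\mathcal{B}(S_n^s - S_n) \subset \mathcal{F}$. For a given $I$, define

$$A_I = \{I\} \cup \{j : e_{Ij} = 1 \text{ or } e^s_{Ij} = 1\}, \qquad B_I = \{k \notin A_I : e_{kj} = 1 \text{ for some } j \in A_I\}$$

where $e_{uv}$ ($e^s_{uv}$) is the indicator that there is an edge connecting $u$ and $v$ in $G(n,p_n)$ ($G^s(n,p_n)$). Let

$$\mathcal{F} = \mathcal{B}(I, A_I, B_I, \{e_{uv} : u \in A_I, v \in A_I \cup B_I\}, \{e^s_{Iv} : v \in A_I\}). \tag{2.12}$$

From the construction of $G^s(n,p_n)$, we have $\mathcal{B}(S_n^s - S_n) \subset \mathcal{F}$. From (2.9) and $|A_I| = \max(deg(I), d) + 1$,

$$\begin{aligned}
E(|S_n^s - S_n|^2 + |S_n^s - S_n|)I(|A_I| > \sqrt{n}) &\le 2E(|deg(I) - d| + 1)^2 I(\max(deg(I), d) + 1 > \sqrt{n}) \\
&\le 2E(|deg(I) - d| + 1)^2(\max(deg(I), d) + 1)/\sqrt{n} \\
&\le c_d/\sqrt{n}
\end{aligned}$$

where we used (2.10) in the last inequality. Similarly,

$$E(|S_n^s - S_n|^2 + |S_n^s - S_n|)I(|B_I| > \sqrt{n}) \le 2E|A_I|^2|B_I|/\sqrt{n} \le 2E|A_I|^2(E^{A_I}|B_I|)/\sqrt{n} \le c_d E|A_I|^3/\sqrt{n} \le c_d/\sqrt{n}.$$

Therefore, to prove (2.11), we only need to prove

$$E\big[(|S_n^s - S_n|^2 + |S_n^s - S_n|)I(|A_I|, |B_I| \le \sqrt{n})\,d_{TV}(\mathcal{L}(S_n|\mathcal{F}), \mathcal{L}(S_n+1|\mathcal{F}))\big] \le c_d/\sqrt{n} \tag{2.13}$$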



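where $\mathcal{F}$ was defined in (2.12). Given $\mathcal{F}$ with $|A_I|, |B_I| \le \sqrt{n}$, we define a random graph $G^{\mathcal{F}}$ with vertex set $\{1,2,\ldots,n\}$ by letting $e^{\mathcal{F}}_{uv} = e_{uv}$ for $u \in A_I, v \in \{1,2,\ldots,n\}$, and letting $e^{\mathcal{F}}_{uv}$ be independent Bernoulli$(p_n)$ random variables for $u, v \in (A_I)^c$, where $e^{\mathcal{F}}$ is the edge indicator for $G^{\mathcal{F}}$. Let $V^{\mathcal{F}} = \sum_{i=1}^n I(deg^{\mathcal{F}}(i) = d)$ be the number of vertices with degree $d$ in $G^{\mathcal{F}}$. Then $\mathcal{L}(V^{\mathcal{F}}) = \mathcal{L}(S_n|\mathcal{F})$, which follows from $\mathcal{L}(G^{\mathcal{F}}) = \mathcal{L}(G(n,p_n)|\mathcal{F})$. In the following we fix a given $\mathcal{F}$ with $|A_I|, |B_I| \le \sqrt{n}$, and prove

$$d_{TV}(\mathcal{L}(V^{\mathcal{F}}), \mathcal{L}(V^{\mathcal{F}} + 1)) \le c_d/\sqrt{n}. \tag{2.14}$$

For ease of notation, we suppress the superscript $\mathcal{F}$, i.e., let $G = G^{\mathcal{F}}$, $V = V^{\mathcal{F}}$, $e = e^{\mathcal{F}}$, $deg = deg^{\mathcal{F}}$. To bound $d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1))$, we uniformly choose $J \ne K$ from $C_I := (A_I \cup B_I)^c$ and resample $e_{JK}$ to be $e'_{JK}$ with the same probability $p_n$, and thus obtain an exchangeable pair $(V, V')$. To apply Lemma 1.6, we first express

$$\begin{aligned}
I(V - V' = 1) ={}& e_{JK}(1 - e'_{JK})\{I(deg(J) = d, deg(K) \ne d, d+1) + I(deg(J) \ne d, d+1, deg(K) = d)\} \\
&+ (1 - e_{JK})e'_{JK}\{I(deg(J) = d, deg(K) \ne d-1, d) + I(deg(J) \ne d-1, d, deg(K) = d)\}.
\end{aligned}$$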
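The case analysis behind the expression for $I(V - V' = 1)$ can be checked by brute force. The sketch below is illustrative and works under simplifying assumptions: it enumerates every graph on a small vertex set (ignoring the restriction of $J, K$ to a subset of vertices, which does not affect the local identity), flips one edge, and compares the indicator with the right-hand side of the display, where degrees are taken in the graph before the flip. The function name is hypothetical.

```python
import itertools

def check_flip_identity(n, d):
    """Check the displayed formula for I(V - V' = 1) on all graphs with n vertices.

    V counts vertices of degree d; V' is the count after the edge indicator of
    the pair (j, k) is changed from e to e_new = 1 - e. When e_new == e both
    sides are trivially zero, so only flips need to be examined.
    """
    pairs = list(itertools.combinations(range(n), 2))
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        edges = dict(zip(pairs, bits))
        deg = [0] * n
        for (u, v), e in edges.items():
            deg[u] += e
            deg[v] += e
        v_count = sum(1 for x in deg if x == d)
        for (j, k) in pairs:
            e = edges[(j, k)]
            e_new = 1 - e
            new_deg = deg[:]
            new_deg[j] += e_new - e
            new_deg[k] += e_new - e
            v_new = sum(1 for x in new_deg if x == d)
            lhs = 1 if v_count - v_new == 1 else 0
            removed = (deg[j] == d and deg[k] not in (d, d + 1)) + \
                      (deg[j] not in (d, d + 1) and deg[k] == d)
            added = (deg[j] == d and deg[k] not in (d - 1, d)) + \
                    (deg[j] not in (d - 1, d) and deg[k] == d)
            rhs = e * (1 - e_new) * removed + (1 - e) * e_new * added
            if lhs != rhs:
                return False
    return True

identity_holds = all(check_flip_identity(n, d) for n in (4, 5) for d in (0, 1, 2))
```

Removing an edge lowers both endpoint degrees by one, so the count drops by exactly one when one endpoint has degree $d$ and the other has degree neither $d$ nor $d+1$; adding an edge is symmetric with $d-1$ in place of $d+1$.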




Then, with m = \Ci\> n — 

nV -V' = l) >E(1 - ejK)e'jKl{deg{J) = d)I{deg{K) ^d-l,d) 

^—^ (1 - Pn)Pnmi = d)¥{^2 ^d-l,d) 

>Cd/n 



m(m- 



where $\xi_1, \xi_2$ are independent random variables with distribution Binomial$(|B_I| + m - 2, p_n)$. Next, we obtain an upper bound of $\mathrm{Var}(E(I(V - V' = 1)|V))$. Note that

Var(E((l - ejK)e'jKl{deg{J) = d)I{deg{K) ^d-l,d)\V)) 
<Var(E((l - ejK)e'jKl{deg{J) = d)I{deg{K) ^ d - 1, d)\G, T)) 
E (1 - e^kW^kHdegij) = d)I{deg{k) ^d-l,d) 

],k£Ci:j^k 



J2 Cov[(l ~ ejk)e'jj{deg{j) = d)I{deg{k) ^d-l,d), 



c 

j,k.jl ,k'eCj: 
jz^k,j'^k' , \j,k.j' ,k 1=2 



(1 - e,,k')e'^.^,,I{deg{j') = d)I{deg{k') ^d-l,d) 
+ 4 E " e,fc)e;,/(deg(j) = d)I{deg{k) ^d- l,d), 



j.kj' ,k' £Cj: 
j^k,i'^k' .\i,k,i' ,k'\=l. 



c 

i,fc,j',fc'eCj: 
\j,k,j' ,k'\=i 



(1 - ej,k')e'yk'I{deg{3') = d)I{deg{k') ^d-l,d) 
J2 Cov[(l - ejk)e'jul{deg{j) = d)I{deg{k) ^d-l,d), 



(1 - e,,k')e'^,,,Iideg{f) = d)Iidegik') ^d-l,d) 

Since $Ee'_{jk} \le c_d/n$, the first two terms in the above bound are bounded by $c_d/n^3$. For the last term, let $C$ be the event that there is no edge connecting $\{j,k\}$ and $\{j',k'\}$ and define

$$\alpha = E[(1 - e_{jk})e'_{jk}I(deg(j) = d)I(deg(k) \ne d-1, d)\,|\,C],$$

$$\beta = E[(1 - e_{jk})e'_{jk}I(deg(j) = d)I(deg(k) \ne d-1, d)].$$

Then for $|\{j,k,j',k'\}| = 4$,

$$\mathrm{Cov}\big[(1 - e_{jk})e'_{jk}I(deg(j) = d)I(deg(k) \ne d-1, d),\ (1 - e_{j'k'})e'_{j'k'}I(deg(j') = d)I(deg(k') \ne d-1, d)\big] \le |(\alpha + \beta)(\alpha - \beta)| + c_d/n^3 \le c_d|\alpha - \beta|/n + c_d/n^3.$$

Let $R$ be the event $\{e_{jk} = 0\}$, then

$$\alpha - \beta = p_n(1 - p_n)\Big\{E\big[I(deg(j) = d)I(deg(k) \ne d-1, d)\,|\,R, C\big] - E\big[I(deg(j) = d)I(deg(k) \ne d-1, d)\,|\,R\big]\Big\}.$$

By a simple coupling argument, $|\alpha - \beta| \le c_d/n^2$. Therefore,

$$\mathrm{Var}(E((1 - e_{JK})e'_{JK}I(deg(J) = d)I(deg(K) \ne d-1, d)|V)) \le c_d/n^3.$$

After bounding the variances of the other terms appearing in $E(I(V - V' = 1)|V)$ by the same argument, we conclude that

$$\mathrm{Var}(E(I(V - V' = 1)|V)) \le c_d/n^3.$$

Similarly,

$$\mathrm{Var}(E(I(V - V' = -1)|V)) \le c_d/n^3.$$

By Lemma 1.6, we obtain (2.14), which yields (2.13). □



2.3.2. Uniform multinomial occupancy model 

We consider the uniform multinomial occupancy model studied by Bartroff and Goldstein [4], to which we refer for the literature on this and related problems. Let $n \ge d \ge 2$, $m \ge 2$ be positive integers. Let $S$ be the number of urns having occupancy $d$ when $n$ balls are uniformly distributed among $m$ urns. In [4], a Berry-Esseen bound on the Kolmogorov distance between the distribution of $S$ and a normal distribution was proved as

$$d_K(\mathcal{L}(S), N(\mu,\sigma^2)) \le c_d\,\frac{1 + (n/m)^3}{\sigma}$$

where $\mu, \sigma^2$ are the mean and variance of $S$ given by

$$\mu = m\binom{n}{d}\frac{1}{m^d}\Big(1 - \frac{1}{m}\Big)^{n-d}, \tag{2.15}$$

$$\sigma^2 = \mu - \mu^2 + m(m-1)\binom{n}{d, d, n-2d}\frac{1}{m^{2d}}\Big(1 - \frac{2}{m}\Big)^{n-2d}I(n \ge 2d) \tag{2.16}$$

and $c_d$ is a constant only depending on $d$. Applying Corollary 2.7, we prove a bound on the total variation distance between the distribution of $S$ and $N^d(\mu,\sigma^2)$.
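The mean and variance of the occupancy count follow from linearity of expectation and from $ES(S-1) = m(m-1)P(\text{urns 1 and 2 both contain } d \text{ balls})$. The sketch below, which is illustrative and uses hypothetical function names and small test values, compares these standard closed forms with exact enumeration over all $m^n$ ball placements.

```python
import itertools
import math

def occupancy_moments_by_enumeration(n, m, d):
    """Exact mean and variance of the number of urns with exactly d balls."""
    total = m**n
    s1 = 0
    s2 = 0
    for assignment in itertools.product(range(m), repeat=n):
        counts = [0] * m
        for urn in assignment:
            counts[urn] += 1
        s = counts.count(d)
        s1 += s
        s2 += s * s
    mean = s1 / total
    return mean, s2 / total - mean**2

def occupancy_moments_closed_form(n, m, d):
    """Closed forms for the mean and variance of the occupancy count."""
    mu = m * math.comb(n, d) * (1.0 / m) ** d * (1.0 - 1.0 / m) ** (n - d)
    var = mu - mu**2
    if n >= 2 * d:
        multinomial = math.factorial(n) // (
            math.factorial(d) ** 2 * math.factorial(n - 2 * d)
        )
        var += m * (m - 1) * multinomial * (1.0 / m) ** (2 * d) \
            * (1.0 - 2.0 / m) ** (n - 2 * d)
    return mu, var

enumerated = occupancy_moments_by_enumeration(5, 3, 2)
closed = occupancy_moments_closed_form(5, 3, 2)
```

For $n = 5$, $m = 3$, $d = 2$ both computations give mean $240/243$.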




Proposition 2.9. Let $n \ge d \ge 2$, $m \ge 2$ be positive integers. Let $S$ be the number of urns containing $d$ balls when $n$ balls are uniformly distributed among $m$ urns. Then, with $\mu, \sigma^2$ given by (2.15), (2.16), we have

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) \le c_d\,\frac{1 + (n/m)^3}{\sigma} \tag{2.17}$$

where $c_d$ is a constant only depending on $d$.

Proof. Wc follow the construction of size bias coupling in [4]. For a given i £ {1,2,..., m}, 
we define m-dimensional random vectors M„,M^ as follows. Let < M >i be the vector 
obtained by deleting the ith component of M. Firstly, wc define the ith components of 
M„,M5j to be Mn{i) ^ Binomial{n,l/m), Mj^{i) — d. Next, let j,R5j be conditionally 
independent given M„(i) such that given M„{i), ^(i) = 0, i?^(i) = and 

C{< j >i \Mn{i)) — Multinomial(n — max{A/„(i), d}, m — 1) 

and 

£(< Rjj >i |M„(i)) = Multinomial{\d - Mn{i)\,m - 1). 

Finally, let 

< M„ >,=< M; >, +L{Mn{i) <d)< Rj, >, 

and 

< M; M;_, >, +L{M„{i) >d)<R.l >, . 
From the above construction, 

£(M„) = Multinomial{n,m), £(Mj,) = £(M„|M„(i) = d). 

Therefore, the number of urns having occupancy d in the uniform multinomial occupancy 
model can be written as 

m 

Define 

m 

5^=^/(A4^(j) = d) 
i=i 

where / is uniformly distributed over {1, 2, . . . , m} and independent of all other variables. 
It was proved in [4] that S'^ has the S'-size biased distribution. We are now ready to apply 
Corollary 2.7. In the rest of this proof, let Cd denote absolute constants which may depend 
on d. 
To prove (2.17), we can assume i_^_f^n/m)^ — '^'^nie given constant depending

on d. In particular, from Lemma 3.1 of [4], given any n*,m*, can be chosen such that 
> Td imphes 

n > n*, m > m*, n < 2TOlogm, < c^, cr^ < CdU. (2-18) 

Moreover, it was shown in [4] that by choosing n* big enough and modifying the vahie of 
so that (2.18) is satisfied. 



1 + {n/m) 



3 



VVar(E(g^ - S\S)) < Cd ^ ' ' . (2.19) 
From the bounds on the moments of binomial distributions and 

\S-' - S\ < \AUI) - d\ + 1, (2.20) 

we have 

E\S' ~ SI" <cd{l + {-)"), fc = 1,2,3,4. (2.21) 
m 

The first three terms on the right-hand side of (2.7) are bounded by Cd from (2.18), 

(2.19) and (2.21). Therefore, to prove Proposition 2.9, we only need to show that 

(1^^ - Sf + \S' - S\)dTvmS\T),C{S + < c^l±i^}Ml (2.22) 

for a fT-field T such that B{S'' — S) C Such a tr-field can be chosen as 

J- = 6{/,M„(/),R^,{M„(j) :i?^(j) >0}} 

from the constructions of M„ and M^. Write 

e[(|5^ - + \S' - S\)dTvmS\J^),C{S + 1|J")) 

= E[(|5^-5|2 + |5' -5|)/(Af„(/)+ M,,{j) > V^)dTv{C{S\T),C{S + 

+ e[{\S' - 5p + 1^-' - S\)I{AUI) + ^-^"(J) ^ MdTv{C{S\T),C{S + l\T)) 

r-K (j)>o 

(2.23) 

For the first term on the right-hand side of (2.23), we bound the total variation distance by 
1 and apply (2.20), (2.21),

e[{\S'~S\^ + \S'~S\)I{M,,{I)+ M^{j)>V^)dTv{C{S\J^),C{S + l\T)) 

r-R'Jj)>o 

< 4=E(|A/„(/) -d\ + lf{Mn{I) + V A/„(j)) 
^ 3--RiU)>o 

1 + (n/m)" 

< Cd 1= . 

(2.24) 

For ri < m, wc have cr^ < from (2.18). For n < 2m log m, we have (see equation (26) and 
3 of Lemma 3.1 in [4]) 

< Cdm(-)'*e-"/". (2.25) 
m 

Therefore, (2.24) is bounded by Cd/cr. 

To bound the second term on the right-hand side of (2.23), for a given J' with A/„(/) + 
'^j-R' {j)>o < let V be the number of urns containing d balls when ?ii balls are 

uniformly distributed among mi urns where 

ni^n-{AU{I)+ J2 Mr,ij))>n-V^ (2-26) 

3-R'„ (i)>0 

and 

mi = m - 1 - |{i : Ri{j) > 0}| > m - 1 - (|A/„(/) ~ d\ + 1) > m - 1 - y/^. (2.27) 

Then $d_{TV}(\mathcal{L}(V), \mathcal{L}(V+1)) = d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))$. To apply Lemma 1.6, we construct an exchangeable pair $(V, V')$ by picking a ball uniformly from the $n_1$ balls and distributing it to an independently and uniformly chosen urn from the $m_1$ urns. Formally, let $M_{n_1}$ be an $m_1$-dimensional random vector with distribution

$$\mathcal{L}(M_{n_1}) = \text{Multinomial}(n_1, m_1).$$

Given $M_{n_1}$, define two random variables $J, K \in \{1,2,\ldots,m_1\}$ with probability mass functions

$$P(J = j) = \frac{M_{n_1}(j)}{n_1}, \qquad P(K = k) = \frac{1}{m_1}.$$

Given $M_{n_1}, J, K$, let $M'_{n_1}$ be the $m_1$-dimensional vector with

$$M'_{n_1}(J) = M_{n_1}(J) - 1, \qquad M'_{n_1}(K) = M_{n_1}(K) + 1$$

and $M'_{n_1}(i) = M_{n_1}(i)$ for $i \ne J, K$. Define

$$V = \sum_{i=1}^{m_1} I(M_{n_1}(i) = d)$$

and

$$V' = \sum_{i=1}^{m_1} I(M'_{n_1}(i) = d).$$

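From the above construction,

$$E(I(V - V' = 1)|M_{n_1}) = \sum_{1 \le j \ne k \le m_1}\frac{M_{n_1}(j)}{m_1 n_1}\Big[I(M_{n_1}(j) = d)I(M_{n_1}(k) \ne d-1, d) + I(M_{n_1}(j) \ne d, d+1)I(M_{n_1}(k) = d)\Big],$$

$$E(I(V - V' = -1)|M_{n_1}) = \sum_{1 \le j \ne k \le m_1}\frac{M_{n_1}(j)}{m_1 n_1}\Big[I(M_{n_1}(j) \ne d, d+1)I(M_{n_1}(k) = d-1) + I(M_{n_1}(j) = d+1)I(M_{n_1}(k) \ne d-1, d)\Big].$$

Therefore,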



F{V -V'^l)^ V \^—F{Mn, U) = d, Mn, {k)^d- 1, d) 



+ Eil/„, ij)HMn, {]) ^d,d+ l)/(Af„, (fc) = d) 

mini 



(2.28) 



ni '™i 



where $B_{n,p}$ denotes a binomial random variable with parameters $n, p$. We proceed to bound $\mathrm{Var}(E(I(V - V' = 1)|V))$ and $\mathrm{Var}(E(I(V - V' = -1)|V))$.

Var(E(/(y-y' = 1)\V)) 
< Var(E(/(F - r = 1)|M„J) 



< 



T 2^ 2 



Var(d ^ /(Af„, (j) = d)/(A/„, (fc) ^ d - 1, d) 



(2.29) 



Var( ^ AU, {j)I{M„, [j] ^d,d+ 1)/(M„, (fc) = d) 

l<j7^^A;<mi 



Let 



ani,mi(j, fc) := /(M„,(j) = d)I{M^,{k) ^d- 
and let g {1, 2, . . . , mi} denote the location of the /th ball. Applying the arguments on 






page 17 of [4], 



Var( ^ 

l<'j^k<mi 



< TliE 



^ ^ y^'^ni ,mi ,(rii) (tAii 7 ^) Oni ,mi (f^n i j fc) 

l<A;<?rii,fc#(7„j 

l<j<T?j.i j"5^C/„j 

where ani,mi,{ni)ij^ k) is the value of ani.miOi ^) when withholding ball ni. Since 

Af„, (t/„, ) — 1 ^ BinomiaUni — 1, ), 

mi 



we have 



Var 



( X! ar,l-l,rni(j, fc) 



l<j7^^/c<7ni 



< niE^ [liMnAUm) = d + 1)/(M„, (fc) ^ d - 

- /(M„, ([/„, ) = d)/(A'/„, (fc) ^ rf - 1, d) 
+ 5^ [/(M„, (j) = d)/(Af„, ([/„ J ^ d, d + 1) 

- /(M„, (j) = d)/(A/„, (C/„J ^ d - 1, d) 

< CdnimlP{B^^ _j_ = d - I or d). 



Let 



(2.30) 



fcni.mi (j, fc) A/„, (j)/(A^ni (j) ^ d, d + l)/(Af„, (fc) = d). 
By the same argument as for yai(^J2i<j^k<vn 0' 

Var(^ X! ^ni-l,mi(i,fc) 
l<j^k<mi 

<niE| ^ [(Af„,(f/„J-l)/(A/„,(C/„J^d+l,d + 2)/(A/„,(A;) = d) 

- M„, )/(Af„, )^d,d+ l)I{Mn, (k) = d) 
+ J2 [^^"1 (j) 7^d,d+ l)/(A/„, (i7„ J = d + 1) 



- A/„, (j)/(A/„, (j) ^ d, d + l)/(Af„, (Un, ) = d) 



(2.31) 



d) 



TOi mi '""i mi 
Var(E(J(T/ -V' = 1)\V)) < ^(1 + (^)')P(5„„



= d — 1 or d) . 



(2.32) 



Similarly, 



Var(E(/(X^- y = -1)1^)) 



< _±p(5^^^ _^ = d or d + 1) 



(2.33) 



Cd_ 

ni 



(Z^ + (Z^)2)p(s = d - 2) + (1 + (I11)2)P(B„^.^ = d - 1) 
mi mi mi '"i 



^2^ 



Applying Lemma 1.6 with (2.28), (2.32) and (2.33), we obtain 
dTv{C{V),C{V + l)) 



M / mi mi-' 



Cd ^ V ™i ™i 



< 



<C<i(l+^/^) 



= d - 2) + (1 + ^) /P(i3„^^^ = d - 1 or d) 



= d+l) 



(l + ^)P(i3„^_^ =d) 



1 



From (2.18), (2.25), (2.26), (2.27) and 



ni—d 



> 



Cd 



> 



Cd 



(I J ]_)ni-d — g(ni-d)/mi ' 

mi ^ 



we have 



dTv(£(l/), £(F + 1)) < cd{l + J—) , ^ 

7^1 (J./e«/™(l _ ;;^)"l-'' 



(2.34) 



Cd(l + Jn/m) I ,rii - d Cd(l + Jn/m) 
< Wexp( ) < ^ . 



(T V 7711 m (T 

This, together with (2.21), proves that the second term on the right-hand side of (2.23) is bounded by $c_d(1 + (n/m)^{5/2})/\sigma$. Therefore, (2.22) is proved.

□ 



3. Proofs 



3.1. Proof of Theorem 1.3 



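From the definition of $N^d(\mu,\sigma^2)$, (1.5), we have

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) = \sup_{h \in \mathcal{H}}|Eh(S) - Eh(Z_{\mu,\sigma^2})|$$

where $Z_{\mu,\sigma^2}$ is a Gaussian variable with mean $\mu$ and variance $\sigma^2$ and

$$\mathcal{H} = \{h : \mathbb{R} \to \{0,1\},\ h(x) = h(z) \text{ when } z - \tfrac{1}{2} \le x < z + \tfrac{1}{2} \text{ for } z \in \mathbb{Z}\}. \tag{3.1}$$

For each $h \in \mathcal{H}$, consider the following Stein equation,

$$\sigma^2 f'(s) - (s - \mu)f(s) = h(s) - Eh(Z_{\mu,\sigma^2}). \tag{3.2}$$

It is known (see [9]) that there exists a bounded solution $f_h$ to (3.2) and

$$\|f_h\| \le \sqrt{\frac{\pi}{2}}\,\frac{1}{\sigma}, \qquad \|f_h'\| \le \frac{2}{\sigma^2}. \tag{3.3}$$

Therefore,

$$d_{TV}(\mathcal{L}(S), N^d(\mu,\sigma^2)) = \sup_{h \in \mathcal{H}}|E\sigma^2 f_h'(S) - E(S - \mu)f_h(S)|. \tag{3.4}$$

Since $(S, S', G)$ satisfies (1.6),

$$\begin{aligned}
E\sigma^2 f_h'(S) - E(S - \mu)f_h(S) &= E\sigma^2 f_h'(S) - E\{Gf_h(S') - Gf_h(S)\} \\
&= E\sigma^2 f_h'(S) - EGDf_h'(S) - EG\int_0^D (f_h'(S+t) - f_h'(S))\,dt \\
&= R_1 - R_2
\end{aligned} \tag{3.5}$$

where

$$R_1 = Ef_h'(S)(\sigma^2 - GD), \qquad R_2 = EG\int_0^D (f_h'(S+t) - f_h'(S))\,dt.$$

From (1.6), $EGD = \sigma^2$. Therefore, from (3.3),

$$|R_1| \le \frac{2}{\sigma^2}\sqrt{\mathrm{Var}(E(GD|S))}. \tag{3.6}$$

For $R_2$, since $f_h$ solves (3.2),

$$\begin{aligned}
R_2 &= EG\int_0^D \frac{1}{\sigma^2}\big((S + t - \mu)f_h(S+t) - (S - \mu)f_h(S) + h(S+t) - h(S)\big)\,dt \\
&= EG\int_0^D \frac{1}{\sigma^2}\big(tf_h(S+t) + (S - \mu)(f_h(S+t) - f_h(S)) + h(S+t) - h(S)\big)\,dt.
\end{aligned} \tag{3.7}$$

Using (3.3), the first two summands in (3.7) can be bounded by

$$\sqrt{\frac{\pi}{8}}\,\frac{1}{\sigma^3}E|GD^2| + \frac{1}{\sigma^4}E|GD^2(S - \mu)|.$$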
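From (3.1) and (1.3),

$$\begin{aligned}
EG\int_0^D (h(S+t) - h(S))\,dt &= EG\int_{-\infty}^{\infty}[I(0 \le t \le D) - I(D \le t < 0)]\,[E^{\mathcal{F}}(h(S+t) - h(S))]\,dt \\
&\le E|G|\int_{-\infty}^{\infty}|I(0 \le t \le D) - I(D \le t < 0)|\,|E^{\mathcal{F}}(h(S+t) - h(S))|\,dt \\
&\le E|G|\int_{-\infty}^{\infty}|I(0 \le t \le D) - I(D \le t < 0)|\,(|t| + \tfrac{1}{2})\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\,dt \\
&\le \frac{1}{2}E\big[(|GD^2| + |GD|)\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\big].
\end{aligned} \tag{3.8}$$

Therefore, using the Cauchy-Schwarz inequality $E|GD^2(S - \mu)| \le \sigma\sqrt{EG^2D^4}$,

$$|R_2| \le \sqrt{\frac{\pi}{8}}\,\frac{1}{\sigma^3}E|GD^2| + \frac{\sqrt{EG^2D^4}}{\sigma^3} + \frac{1}{2\sigma^2}E\big[(|GD^2| + |GD|)\,d_{TV}(\mathcal{L}(S|\mathcal{F}), \mathcal{L}(S+1|\mathcal{F}))\big]. \tag{3.9}$$

The theorem is proved by using (3.4), (3.5) and the bounds (3.6), (3.9).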
3.2. Proof of Proposition 1.7 

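We construct an exchangeable pair $(S, S')$ in the following way. Assume that for each $i \in \{1,\ldots,n\}$, $j \in \mathbb{Z}$, $P(X_i = j) = p_{ij}$. Let

$$a_{ij} = (p_{ij} \wedge p_{i,j+1})/2.$$

We have

$$\sum_{j \in \mathbb{Z}} a_{ij} = \frac{1}{2}\big(1 - d_{TV}(\mathcal{L}(X_i), \mathcal{L}(X_i + 1))\big).$$

Using Mineka coupling (see [15]), let $(X_i, X_i')$ be coupled so that

$$P(X_i = j-1, X_i' = j) = P(X_i' = j-1, X_i = j) = a_{i,j-1},$$

$$P(X_i = X_i' = j) = p_{ij} - a_{i,j-1} - a_{ij}.$$

Therefore, $(X_i, X_i')$ is an exchangeable pair. Let $I$ be a uniform random index in $\{1,2,\ldots,n\}$ and independent of $\{X_1,\ldots,X_n\}$, and let

$$S' = S - X_I + X_I'.$$

Then $(S, S')$ is an exchangeable pair.

$$P(S - S' = 1) = \frac{1}{n}\sum_{i=1}^n P(X_i - X_i' = 1) = \frac{1}{n}\sum_{i=1}^n\sum_{j \in \mathbb{Z}} a_{ij}.$$

Moreover,

$$\begin{aligned}
\mathrm{Var}(E(I(S - S' = 1)|S)) &= \frac{1}{n^2}\mathrm{Var}\Big[\sum_{i=1}^n E(I(X_i - X_i' = 1)|S)\Big] \\
&\le \frac{1}{n^2}\mathrm{Var}\Big[\sum_{i=1}^n I(X_i - X_i' = 1)\Big] \\
&= \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(I(X_i - X_i' = 1)) \\
&\le \frac{1}{n^2}\sum_{i=1}^n\sum_{j \in \mathbb{Z}} a_{ij}.
\end{aligned}$$

Similarly,

$$\mathrm{Var}(E(I(S - S' = -1)|S)) \le \frac{1}{n^2}\sum_{i=1}^n\sum_{j \in \mathbb{Z}} a_{ij}.$$

The proof is finished by invoking Lemma 1.6.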
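The coupling used in this proof is concrete enough to be built explicitly for a single summand. The following sketch is an illustration with hypothetical function names and an arbitrary example distribution: it constructs the joint law of $(X, X')$ from a mass function, so that the marginals, the symmetry, and the identity $P(X - X' = 1) = \sum_j a_j = (1 - d_{TV}(\mathcal{L}(X), \mathcal{L}(X+1)))/2$ can be checked.

```python
def shift_tv(pmf):
    """d_TV(L(X), L(X + 1)) for a mass function given as {integer: probability}."""
    keys = range(min(pmf), max(pmf) + 2)
    return 0.5 * sum(abs(pmf.get(k, 0.0) - pmf.get(k - 1, 0.0)) for k in keys)

def mineka_coupling(pmf):
    """Joint law of (X, X') with P(X = j-1, X' = j) = P(X = j, X' = j-1) = a_{j-1}.

    Here a_j = min(p_j, p_{j+1}) / 2 and the remaining mass stays on the diagonal.
    Returns the joint law as {(x, x'): probability} together with the a_j.
    """
    lo, hi = min(pmf), max(pmf)
    a = {j: min(pmf.get(j, 0.0), pmf.get(j + 1, 0.0)) / 2.0
         for j in range(lo - 1, hi + 1)}
    joint = {}
    for j in range(lo, hi + 1):
        joint[(j, j)] = pmf.get(j, 0.0) - a[j - 1] - a[j]
        if a[j - 1] > 0.0:
            joint[(j - 1, j)] = a[j - 1]
            joint[(j, j - 1)] = a[j - 1]
    return joint, a

pmf = {0: 0.2, 1: 0.5, 2: 0.2, 4: 0.1}   # example with a gap in the support
joint, a = mineka_coupling(pmf)
prob_down = sum(w for (x, y), w in joint.items() if x - y == 1)  # P(X - X' = 1)
```

For the example distribution, $\sum_j a_j = 0.2$ and $d_{TV}(\mathcal{L}(X), \mathcal{L}(X+1)) = 0.6$, so both sides of the identity equal $0.2$.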